gSoFa: Scalable Sparse Symbolic LU Factorization on GPUs
Abstract
Decomposing a matrix $\mathbf{A}$ into a lower matrix $\mathbf{L}$ and an upper matrix $\mathbf{U}$, also known as LU decomposition, is an essential operation in numerical linear algebra. For a sparse matrix, LU decomposition often introduces more nonzero entries in the $\mathbf{L}$ and $\mathbf{U}$ factors than in the original matrix. A symbolic factorization step is needed to identify the nonzero structures of the $\mathbf{L}$ and $\mathbf{U}$ matrices. Attracted by the enormous potential of Graphics Processing Units (GPUs), an array of efforts has surged to deploy various LU factorization steps on GPUs, except, to the best of our knowledge, for symbolic factorization. This article introduces gSoFa, the first GPU-based symbolic factorization design, with the following three optimizations to enable scalable LU symbolic factorization for nonsymmetric-pattern sparse matrices on GPUs. First, we introduce a novel fine-grained parallel symbolic factorization algorithm that is well suited for the Single Instruction Multiple Thread (SIMT) architecture of GPUs. Second, we tailor supernode detection into a SIMT-friendly process and strive to balance the workload, minimize communication, and saturate GPU computing resources during supernode detection. Third, we introduce a three-pronged optimization to reduce the excessive space consumption faced by multi-source concurrent symbolic factorization. Taken together, gSoFa achieves up to 31× speedup when scaling from 1 to 44 Summit nodes (6 to 264 GPUs) and outperforms the state-of-the-art CPU project by 5× on average. Notably, gSoFa achieves 47 percent of the peak memory throughput of a V100 GPU in the Summit supercomputer.
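The abstract describes symbolic factorization only at a high level. As a hedged illustration, the sketch below computes the nonzero structure of L+U sequentially using the fill-path rule of Rose and Tarjan: entry (i, j) is nonzero iff the directed graph of A has a path from i to j whose intermediate vertices are all numbered below min(i, j). It assumes no pivoting, a structurally nonzero diagonal, and no numerical cancellation, and it cross-checks the result against a plain replay of elimination on the boolean pattern. It also groups columns into supernodes using one common SuperLU-style definition on the columns of L. All function names are hypothetical; this is not gSoFa's GPU algorithm, and gSoFa's exact supernode rule may differ.

```python
"""Sequential sketch of sparse symbolic LU factorization (no pivoting).

Hypothetical helper names; illustrates the fill-path rule on a CPU,
not gSoFa's GPU kernels or data structures.
"""
import heapq


def symbolic_elimination(n, nonzeros):
    """Reference: replay Gaussian elimination on a boolean nonzero pattern."""
    pattern = [[False] * n for _ in range(n)]
    for i, j in nonzeros:
        pattern[i][j] = True
    for k in range(n):
        pattern[k][k] = True  # assumption: every pivot is structurally nonzero
    for k in range(n):
        rows = [i for i in range(k + 1, n) if pattern[i][k]]
        cols = [j for j in range(k + 1, n) if pattern[k][j]]
        for i in rows:
            for j in cols:
                pattern[i][j] = True  # pivot k creates (or keeps) entry (i, j)
    return {(i, j) for i in range(n) for j in range(n) if pattern[i][j]}


def fill_path_factorization(n, nonzeros):
    """(i, j) is in L+U iff G(A) has a path i -> j whose intermediate
    vertices are all numbered below min(i, j) (Rose-Tarjan fill-path rule)."""
    adj = [[] for _ in range(n)]
    for i, j in nonzeros:
        if i != j:
            adj[i].append(j)  # directed edge i -> j for a_ij != 0
    structure = set()
    for src in range(n):  # each source row is independent of the others
        # best[v]: smallest possible "largest intermediate index" on a src -> v path
        # (-1 means no intermediate vertex, n means unreachable)
        best = [n] * n
        best[src] = -1
        heap = [(-1, src)]
        while heap:
            cost, v = heapq.heappop(heap)
            if cost > best[v]:
                continue  # stale heap entry
            # leaving v makes v an intermediate vertex, unless v is the source
            through = cost if v == src else max(cost, v)
            for w in adj[v]:
                if through < best[w]:
                    best[w] = through
                    heapq.heappush(heap, (through, w))
        structure.add((src, src))
        for j in range(n):
            if j != src and best[j] < min(src, j):
                structure.add((src, j))
    return structure


def detect_supernodes(n, lu_structure):
    """Merge consecutive columns j-1 and j when struct(L[:, j-1]) minus {j-1}
    equals struct(L[:, j]); struct includes the diagonal (SuperLU-style)."""
    lcol = [set() for _ in range(n)]
    for i, j in lu_structure:
        if i >= j:
            lcol[j].add(i)
    supernodes, start = [], 0
    for j in range(1, n):
        if lcol[j - 1] - {j - 1} != lcol[j]:
            supernodes.append((start, j - 1))  # close the current supernode
            start = j
    supernodes.append((start, n - 1))
    return supernodes


if __name__ == "__main__":
    # 4x4 example: diagonal plus four off-diagonal entries
    A = [(0, 0), (1, 1), (2, 2), (3, 3), (3, 0), (0, 2), (1, 3), (2, 1)]
    lu = fill_path_factorization(4, A)
    assert lu == symbolic_elimination(4, A)
    print("fill-in:", sorted(lu - set(A)))  # fill-in: [(2, 3), (3, 2)]
    print("supernodes:", detect_supernodes(4, lu))  # supernodes: [(0, 0), (1, 1), (2, 3)]
```

Each source row's traversal reads only the input graph, so the outer loop over sources is the natural place to parallelize. The heap-based traversal here is a simple sequential choice; how gSoFa actually maps traversals onto SIMT threads and bounds memory for many concurrent sources is described in the article, not reproduced here.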
Similar Resources
Parallel Symbolic Factorization for Sparse LU Factorization with Static Pivoting
In this paper we consider a direct method to solve a sparse unsymmetric system of linear equations Ax = b, which is the Gaussian elimination. This elimination consists in explicitly factoring the matrix A into the product of L and U , where L is a unit lower triangular matrix, and U is an upper triangular matrix, followed by solving LUx = b one factor at a time. One of the main characteristics ...
Parallel Symbolic Factorization for Sparse LU with Static Pivoting
This paper presents the design and implementation of a memory-scalable parallel symbolic factorization algorithm for general sparse unsymmetric matrices. Our parallel algorithm uses a graph partitioning approach, applied to the graph of |A| + |A|^T, to partition the matrix in a way that is good both for sparsity preservation and for parallel factorization. The partitioning yields a so-call...
Using Postordering and Static Symbolic Factorization for Parallel Sparse LU
In this paper we present several improvements of widely used parallel LU factorization methods on sparse matrices. First we introduce the LU elimination forest and then we characterize the L, U factors in terms of their corresponding LU elimination forest. This characterization can be used as a compact storage scheme of the matrix as well as of the task dependence graph. To improve the use of B...
Sparse LU factorization on the CRAY T3D
The paper describes a parallel algorithm for the LU factorization of sparse matrices on distributed memory machines, using SPMD as the programming model and PVM as the message passing interface. We address all the difficulties arising in sparse codes, such as fill-in or the dynamic movement of data inside the matrix. The cyclic distribution has been used to evenly distribute the elements onto a mesh of p...
Elimination Forest Guided 2D Sparse LU Factorization
Sparse LU factorization with partial pivoting is important for many scientific applications, and delivering high performance for this problem is difficult on distributed memory machines. Our previous work has developed an approach called S+ that incorporates static symbolic factorization, supernode partitioning, and graph scheduling. This paper studies the properties of elimination forests and uses the...
Journal
Journal title: IEEE Transactions on Parallel and Distributed Systems
Year: 2022
ISSN: 1045-9219, 1558-2183, 2161-9883
DOI: https://doi.org/10.1109/tpds.2021.3090316